STATS 32: Introduction to R for Undergraduates

Elena Tuzhilina

Sep 21, 2021

http://web.stanford.edu/~elenatuz/courses/stats32-aut2021/

Instructor

Elena Tuzhilina

Agenda for today

The big data explosion



What is R?

Ross Ihaka & Rob Gentleman

Why learn R?

Reason #1: R was specifically designed for statistics and data analysis.

Example: Map of 2016 U.S. presidential elections

Example: Spotify Top 100 Songs in 2017

Why learn R?

(Source: Stack Overflow)

Why learn R?

Reason #3: It’s easy to get started with R.

Stack Overflow

Q&A site for programmers

Packages
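
As a rough illustration of how packages are typically used, here is a minimal sketch. The helper name `load_or_install` is hypothetical (it is not part of R or of the course materials); `requireNamespace()`, `install.packages()` and `library()` are standard R functions. The demo call uses `stats`, a package that ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN; needs an internet connection
  }
  library(pkg, character.only = TRUE)  # attach the package for this session
  invisible(TRUE)
}

# "stats" comes with every R installation, so this call only loads it.
load_or_install("stats")
```

In everyday use, most people simply run `install.packages("ggplot2")` once and then `library(ggplot2)` at the top of each script.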

Why learn R?

Reason #4: Analyses done in R are reproducible.

R script

# load packages and get dataset
library(ggplot2)
data(mtcars)

# plot of miles per gallon vs. horsepower, colored by no. of cylinders
ggplot(data = mtcars, aes(x = hp, y = mpg, col = factor(cyl))) +
    geom_point() +
    labs(title = "Miles per gallon vs. horsepower")

R markdown: input

R markdown: output

Why learn R?

Reason #5: Community

How about you?

In the next minute, introduce yourself to someone around you!

Course objectives

By the end of this course, students will be able to:

Tentative overview of the course

Week 1: Introduction to RStudio, basic objects in R

Week 2: Data visualization

Week 3: Transforming and cleaning data

Week 4: Importing and publishing data

Week 5: Applications (map-making, data modeling)

Class logistics

Bring your laptop to class!

Assignments

Installing R on your computer

Please follow the instructions at http://web.stanford.edu/~elenatuz/courses/stats32-aut2021/lectures.html

What is a variable?

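x <- 3       # x stores the number 3
y <- "abc"   # y stores the text "abc"
y <- 5       # y is overwritten and now stores the number 5
x <- y       # x takes y's current value, 5
x + y        # 5 + 5 = 10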

Variable types
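
One way to see a variable's type is the base R function `class()`. The sketch below assumes the three types most often met first (numeric, character, logical). The variable names are made up for illustration and do not come from the slides.

```r
# Illustrative variables (names are hypothetical)
height     <- 1.75     # a number
name       <- "Ada"    # text, written in quotes
is_student <- TRUE     # a logical value: TRUE or FALSE

class(height)      # "numeric"
class(name)        # "character"
class(is_student)  # "logical"
```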

Confusion: 123 vs. “123”

How do we tell a numeric variable apart from a character variable that consists only of digits?
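
A small sketch of one possible answer, using only base R: the quotes decide the type, `class()` reports it, and `as.numeric()` converts digit-only text back into a number. The names `num` and `chr` are illustrative.

```r
num <- 123      # no quotes: a number
chr <- "123"    # quotes: text that happens to consist of digits

class(num)      # "numeric"
class(chr)      # "character"

num + 1         # 124
# chr + 1 would stop with an error, because text cannot be added to a number

as.numeric(chr) + 1   # 124, after converting the text to a number
```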

Optional material

List of useful packages

Other fun R stuff

R-bloggers

Blog aggregator of content contributed by bloggers who write about R

The R Journal

Biannual open-access journal featuring short- to medium-length articles on topics of interest to R users and developers

R-exercises

Website with both tutorials and exercises

DataCamp

Website for learning data science, R included (some courses free, some not)